alternative aggregation PoC #603

finiteprods · 2020-11-13T10:18:10Z

Warning: the changes here are meant purely for illustrative PoC purposes and should not be merged.

This branch contains test code demonstrating that it is possible, in principle, to support an aggregation other than "federated averaging" in the PET protocol. The example illustrated here is a histogram aggregation. A quick breakdown of the experiment:

The coordinator and participants are assumed to agree on the value ranges (bins) of the histogram, e.g. 0 - 5, 5 - 10, 10 - 15, 15 - 20.
The test-drive is used to simulate clients, each providing a "measurement" in one of those ranges; e.g. a measurement of 7.5 falls into the 5 - 10 range. To convey this range to the coordinator, the client constructs the "model" [0, 1, 0, 0].
The PET protocol runs as usual (not entirely true; explained later), so the coordinator learns none of the individual models but still unmasks their overall sum. This sum is exactly the histogram.
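To make the encoding concrete, here is a minimal Rust sketch under stated assumptions: bins are half-open [lo, hi) with the example edges, and the names `encode_measurement`, `mask_from_seed` and `aggregate` are hypothetical. The masking is a toy additive scheme modulo a hypothetical `Q`, meant only to show why the coordinator learns the sum but not the individual models; it is not xaynet's PET masking implementation, and the mask generator is not cryptographically secure.

```rust
// Hypothetical bin edges agreed upfront: [0,5), [5,10), [10,15), [15,20).
const EDGES: [f64; 5] = [0.0, 5.0, 10.0, 15.0, 20.0];

// Toy modulus for additive masking (an assumption, not xaynet's masking config).
const Q: u64 = 1 << 32;

/// Encode a measurement as a one-hot "model": 1 in its bin, 0 elsewhere.
/// Bins are half-open [lo, hi); returns None if no bin contains `x`.
fn encode_measurement(x: f64, edges: &[f64]) -> Option<Vec<u64>> {
    let bins = edges.len() - 1;
    let idx = (0..bins).find(|&i| edges[i] <= x && x < edges[i + 1])?;
    let mut model = vec![0u64; bins];
    model[idx] = 1;
    Some(model)
}

/// Toy pseudo-random mask derived from a seed (splitmix64; NOT cryptographically secure).
fn mask_from_seed(seed: u64, len: usize) -> Vec<u64> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = state;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            (z ^ (z >> 31)) % Q
        })
        .collect()
}

/// Element-wise (a + b) mod Q.
fn add_mod(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| (x + y) % Q).collect()
}

/// Element-wise (a - b) mod Q.
fn sub_mod(a: &[u64], b: &[u64]) -> Vec<u64> {
    a.iter().zip(b).map(|(x, y)| (x + Q - y) % Q).collect()
}

/// Simulate one round: each client masks its one-hot model; the coordinator
/// sums the masked models, then removes the aggregated mask to get the histogram.
fn aggregate(measurements: &[f64]) -> Vec<u64> {
    let bins = EDGES.len() - 1;
    let mut masked_sum = vec![0u64; bins];
    let mut mask_sum = vec![0u64; bins];
    for (i, &x) in measurements.iter().enumerate() {
        let model = encode_measurement(x, &EDGES).expect("measurement outside all bins");
        let mask = mask_from_seed(i as u64 + 1, bins);
        // The coordinator only ever sees model + mask, never the model itself.
        masked_sum = add_mod(&masked_sum, &add_mod(&model, &mask));
        // The masks are aggregated separately (by the sum participants in PET).
        mask_sum = add_mod(&mask_sum, &mask);
    }
    sub_mod(&masked_sum, &mask_sum)
}

fn main() {
    let histogram = aggregate(&[7.5, 1.2, 18.0, 6.3]);
    println!("{:?}", histogram); // [1, 2, 0, 1]
}
```

Because every mask cancels once the aggregated mask is subtracted, the unmasked sum is exactly the per-bin count, while no single masked model reveals which bin its client chose.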

In one particular test run, after spinning up a coordinator and running the test-drive with -n 10, I observed the following:

7 update participants computed a masked model. Of these:
2 sent [1, 0, 0, 0]
3 sent [0, 1, 0, 0]
2 sent [0, 0, 0, 1]

On the coordinator, the unmasked model and the histogram are visible in the console output:

histogram for [2, 3, 0, 2]

    *        
*   *       *
*   *       *
*************
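As a sanity check on the run above, the following Rust sketch sums the seven reported one-hot models element-wise and prints a simple vertical bar chart. The helper names `sum_models` and `render` are hypothetical, and the renderer is a stand-in written for illustration; it is not the crate used in the PoC, and its layout differs from the console output above.

```rust
/// Element-wise sum of equal-length models (hypothetical helper).
fn sum_models(models: &[Vec<u32>]) -> Vec<u32> {
    let len = models.first().map_or(0, |m| m.len());
    let mut total = vec![0u32; len];
    for m in models {
        for (t, v) in total.iter_mut().zip(m) {
            *t += v;
        }
    }
    total
}

/// Minimal vertical bar chart, one column per bin, tallest row first.
fn render(counts: &[u32]) -> String {
    let max = counts.iter().copied().max().unwrap_or(0);
    let mut out = String::new();
    for level in (1..=max).rev() {
        let row: Vec<&str> = counts
            .iter()
            .map(|&c| if c >= level { "*" } else { " " })
            .collect();
        out.push_str(row.join(" ").trim_end());
        out.push('\n');
    }
    out
}

fn main() {
    // The models reported for the 7 update participants in the test run.
    let mut models: Vec<Vec<u32>> = vec![vec![1, 0, 0, 0]; 2];
    models.extend(vec![vec![0u32, 1, 0, 0]; 3]);
    models.extend(vec![vec![0u32, 0, 0, 1]; 2]);

    let histogram = sum_models(&models);
    println!("histogram for {:?}", histogram); // histogram for [2, 3, 0, 2]
    print!("{}", render(&histogram));
}
```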

Further remarks.

With some refinements to the histogram protocol above (not shown here), it is possible to compute other kinds of aggregations, such as a maximum or minimum, still in a privacy-preserving way. Whether these cover a reasonable share of our mobile analytics use cases is still to be investigated.
How to parametrise the aggregation, e.g. switching between averaging, histogram or something else, is to be considered elsewhere.
The test-drive still uses dummy models (just in this slightly different form). In the real implementation, a client would construct its model from some value accessible on the device.
As mentioned above, every party was assumed to know the ranges of the histogram upfront. In practice, the ranges would need to be communicated somehow, since clients need this information to compute their models.
A small modification was made to the unmasking part of the PET protocol: we skip the "correction" step, which re-scales the unmasked vector, because we want a straightforward sum of the models rather than a weighted average.
(Minor point) To print the mini-histogram, I used a tiny crate, hist. While the output above looks sensible, my mileage varied a lot! It behaves a little peculiarly, sometimes adding or removing a point. It's not documented, so perhaps I'm misusing it.
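The remark about computing a maximum or minimum does not show the refinement it refers to. As a much simpler and coarser illustration (not that refinement), the bin containing the maximum or minimum measurement can be read straight off the unmasked histogram, with resolution limited to the bin width. A Rust sketch with hypothetical helpers `max_bin_range` and `min_bin_range`:

```rust
/// Range [lo, hi) of the highest non-empty bin: the maximum measurement lies in it.
fn max_bin_range(edges: &[f64], counts: &[u32]) -> Option<(f64, f64)> {
    let i = counts.iter().rposition(|&c| c > 0)?;
    Some((edges[i], edges[i + 1]))
}

/// Range [lo, hi) of the lowest non-empty bin: the minimum measurement lies in it.
fn min_bin_range(edges: &[f64], counts: &[u32]) -> Option<(f64, f64)> {
    let i = counts.iter().position(|&c| c > 0)?;
    Some((edges[i], edges[i + 1]))
}

fn main() {
    let edges = [0.0, 5.0, 10.0, 15.0, 20.0];
    let counts = [2, 3, 0, 2]; // the histogram from the test run above
    println!("max in {:?}", max_bin_range(&edges, &counts)); // max in Some((15.0, 20.0))
    println!("min in {:?}", min_bin_range(&edges, &counts)); // min in Some((0.0, 5.0))
}
```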
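The remark about skipping the "correction" step is easiest to see with numbers. Under a weighted average (the sum of w_i * x_i divided by the sum of w_i), one-hot models turn into bin frequencies rather than counts, whereas a plain sum keeps the counts a histogram needs. A minimal Rust sketch; `weighted_average` and `plain_sum` are hypothetical names, and this is not xaynet's actual correction code, which the PR text does not show.

```rust
/// Generic weighted average of models: sum_i(w_i * x_i) / sum_i(w_i).
fn weighted_average(models: &[Vec<f64>], weights: &[f64]) -> Vec<f64> {
    let len = models.first().map_or(0, |m| m.len());
    let total_weight: f64 = weights.iter().sum();
    let mut acc = vec![0.0_f64; len];
    for (m, &w) in models.iter().zip(weights) {
        for (a, &v) in acc.iter_mut().zip(m) {
            *a += w * v;
        }
    }
    acc.iter().map(|a| a / total_weight).collect()
}

/// Plain element-wise sum of models: what the histogram aggregation needs.
fn plain_sum(models: &[Vec<f64>]) -> Vec<f64> {
    let len = models.first().map_or(0, |m| m.len());
    let mut acc = vec![0.0_f64; len];
    for m in models {
        for (a, &v) in acc.iter_mut().zip(m) {
            *a += v;
        }
    }
    acc
}

fn main() {
    let models = vec![
        vec![1.0, 0.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0, 0.0],
    ];
    let weights = [1.0, 1.0, 1.0];
    println!("average: {:?}", weighted_average(&models, &weights)); // bin frequencies
    println!("sum:     {:?}", plain_sum(&models)); // [1.0, 2.0, 0.0, 0.0]
}
```

With equal weights the average yields roughly [0.33, 0.67, 0, 0], i.e. the share of clients per bin, which loses the absolute counts that the plain sum preserves.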

finiteprods · 2020-12-14T09:37:10Z

Closing, as this is superseded by #635.

mini histogram eg

6f362f3

finiteprods closed this Dec 14, 2020
